Accelerating Federated Learning via Momentum Gradient Descent
Authors
Abstract
Similar Resources
Accelerating Stochastic Gradient Descent via Online Learning to Sample
Stochastic Gradient Descent (SGD) is one of the most widely used techniques for online optimization in machine learning. In this work, we accelerate SGD by adaptively learning how to sample the most useful training examples at each time step. First, we show that SGD can be used to learn the best possible sampling distribution of an importance sampling estimator. Second, we show that the samplin...
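The core idea above, using gradient information to adapt an importance-sampling distribution over training examples, can be sketched in a few lines. The least-squares model, the multiplicative-weights update for the sampler, and all hyperparameters below are illustrative assumptions, not the authors' exact algorithm; the importance weight 1/(n·p_i) keeps each step an unbiased estimate of the full gradient.

```python
# Minimal sketch: SGD with an adaptively learned importance-sampling
# distribution over training examples. The least-squares model, data,
# and multiplicative sampler update are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)                 # model parameters
p = np.full(n, 1.0 / n)         # sampling distribution over the n examples
lr, sampler_lr = 0.05, 0.01

for t in range(2000):
    i = rng.choice(n, p=p)                    # draw one example from p
    g = (X[i] @ w - y[i]) * X[i]              # gradient of 0.5 * (x.w - y)^2
    w -= lr * g / (n * p[i])                  # importance weight keeps the step unbiased
    # Nudge p toward the variance-optimal distribution, which is
    # proportional to per-example gradient norms:
    p[i] *= np.exp(sampler_lr * np.linalg.norm(g))
    p /= p.sum()

print("parameter error:", np.linalg.norm(w - w_true))
```

Sampling proportionally to per-example gradient norms is the variance-minimizing choice for this estimator, which is what the multiplicative update chases.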
Accelerating Stochastic Gradient Descent
There is widespread sentiment that fast gradient methods (e.g. Nesterov’s acceleration, conjugate gradient, heavy ball) are not effective for the purposes of stochastic optimization due to their instability and error accumulation. Numerous works have attempted to quantify these instabilities in the face of either statistical or non-statistical errors (Paige, 1971; Proakis, 1974; Polyak, 1987; G...
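For reference, the heavy-ball method named above, which is also the momentum scheme in the spirit of the title paper, augments each gradient step with a velocity term. A minimal sketch on an ill-conditioned quadratic, with assumed step size and momentum coefficient, shows the speedup in the noiseless case:

```python
# Minimal sketch: plain gradient descent vs. heavy-ball momentum on an
# ill-conditioned quadratic 0.5 * x^T A x. Step size and momentum
# coefficient are illustrative assumptions.
import numpy as np

A = np.diag([1.0, 100.0])       # condition number 100

def grad(x):
    return A @ x

x_gd = np.array([1.0, 1.0])
x_hb = x_gd.copy()
v = np.zeros(2)
lr, beta = 0.009, 0.9           # lr < 2/L with L = 100; beta is the momentum

for t in range(200):
    x_gd = x_gd - lr * grad(x_gd)     # plain gradient descent
    v = beta * v - lr * grad(x_hb)    # heavy-ball velocity accumulation
    x_hb = x_hb + v

print("gradient descent, distance to optimum:", np.linalg.norm(x_gd))
print("heavy-ball,       distance to optimum:", np.linalg.norm(x_hb))
```

Both runs are stable here, and the heavy-ball iterate reaches the optimum orders of magnitude faster; the instability the paragraph above refers to appears once the gradients are noisy.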
Learning ReLUs via Gradient Descent
In this paper we study the problem of learning Rectified Linear Units (ReLUs), which are functions of the form x ↦ max(0, ⟨w,x⟩) with w ∈ ℝ^d denoting the weight vector. We study this problem in the high-dimensional regime where the number of observations is smaller than the dimension of the weight vector. We assume that the weight vector belongs to some closed set (convex or nonconvex) which captu...
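Concretely, fitting a single ReLU by gradient descent on the squared loss looks as follows. For simplicity this sketch uses more observations than dimensions and omits the paper's projection onto the constraint set; the Gaussian data, planted weights, initialization, and step size are illustrative assumptions.

```python
# Minimal sketch: gradient descent on the squared loss of a single ReLU,
# y = max(0, <w, x>). Gaussian data, the planted w*, and the step size
# are illustrative assumptions; the projection step is omitted.
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 20
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = np.maximum(0.0, X @ w_star)            # labels from a planted ReLU

w = 0.01 * rng.normal(size=d)              # small random initialization
lr = 0.5
for t in range(500):
    z = X @ w
    r = np.maximum(0.0, z) - y             # residuals of the current fit
    g = (2.0 / n) * X.T @ (r * (z > 0))    # (z > 0) is the ReLU's active mask
    w -= lr * g

print("relative error:", np.linalg.norm(w - w_star) / np.linalg.norm(w_star))
```

From a small random start, the iterates typically recover the planted weights to high accuracy in this overdetermined setting.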
Learning via Gradient Descent in Sigma
Integrating a gradient-descent learning mechanism at the core of the graphical models upon which the Sigma cognitive architecture/system is built yields learning behaviors that span important forms of both procedural learning (e.g., action and reinforcement learning) and declarative learning (e.g., supervised and unsupervised concept formation), plus several additional forms of learning (e.g., ...
VAE Learning via Stein Variational Gradient Descent
A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupe...
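The SVGD update that drives this method moves a set of particles along a kernel-smoothed score of the target plus a repulsive kernel-gradient term. A minimal sketch against a standard 2-D Gaussian target, with an assumed median-heuristic bandwidth and step size, looks like this:

```python
# Minimal sketch: Stein variational gradient descent (SVGD) moving
# particles toward a standard 2-D Gaussian. The target, bandwidth
# heuristic, step size, and particle count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 2
x = rng.normal(size=(n, d)) + 5.0              # particles, started off-target

def grad_log_p(x):
    return -x                                  # score of a standard Gaussian

lr = 0.5
for t in range(1000):
    diff = x[:, None, :] - x[None, :, :]       # diff[i, j] = x_i - x_j
    sq = np.sum(diff**2, axis=-1)
    h = np.median(sq) / np.log(n + 1)          # median bandwidth heuristic
    K = np.exp(-sq / h)                        # RBF kernel matrix
    drive = K @ grad_log_p(x)                  # kernel-smoothed score term
    repel = (2.0 / h) * np.einsum('ij,ijk->ik', K, diff)  # kernel-gradient term
    x = x + lr * (drive + repel) / n           # SVGD particle update

print("particle mean (target 0):", x.mean(axis=0))
print("particle std  (target 1):", x.std(axis=0))
```

The repulsive term is what keeps the particles from collapsing onto the mode, so their spread approximates the target's variance.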
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2020
ISSN: 1045-9219,1558-2183,2161-9883
DOI: 10.1109/tpds.2020.2975189